Physics-Guided Deepfake Detection for Voice Authentication Systems

Mohammadi, Alireza, Sood, Keshav, Thiruvady, Dhananjay, Nazari, Asef

arXiv.org Artificial Intelligence

Abstract--Voice authentication systems deployed at the network edge face dual threats: a) sophisticated deepfake synthesis attacks and b) control-plane poisoning in distributed federated learning protocols. We present a framework coupling physics-guided deepfake detection with uncertainty-aware edge learning. The representations are processed by a multi-modal ensemble architecture, followed by a Bayesian ensemble that provides uncertainty estimates. Incorporating physics-based characteristic evaluations and uncertainty estimates of audio samples allows our proposed framework to remain robust to both advanced deepfake attacks and sophisticated control-plane poisoning, addressing the complete threat model for networked voice authentication. Advanced neural speech deepfake generation has fundamentally transformed voice authentication security.
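The abstract does not specify how the Bayesian ensemble turns member outputs into an uncertainty estimate. A minimal illustrative sketch, assuming (hypothetically) that each ensemble member emits a deepfake probability and that uncertainty is scored via the predictive entropy of the averaged prediction:

```python
import math

def ensemble_uncertainty(member_probs):
    """Combine per-member deepfake probabilities into a mean
    prediction and a predictive-entropy uncertainty score (bits)."""
    mean_p = sum(member_probs) / len(member_probs)
    eps = 1e-12  # guard against log(0)
    entropy = -(mean_p * math.log2(mean_p + eps)
                + (1 - mean_p) * math.log2(1 - mean_p + eps))
    return mean_p, entropy

# Members agree: confident prediction, low entropy.
p_agree, h_agree = ensemble_uncertainty([0.9, 0.92, 0.88])
# Members disagree: entropy near its 1-bit maximum, so the sample
# can be rejected or escalated rather than authenticated.
p_split, h_split = ensemble_uncertainty([0.1, 0.9, 0.5])
```

High-entropy samples are exactly the ones a robust pipeline would refuse to authenticate, which is how uncertainty estimates help against both deepfakes and poisoned model updates.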


HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal

Li, Kexin, Hu, Xiao, Grishchenko, Ilya, Lie, David

arXiv.org Artificial Intelligence

The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated audio is by watermarking it, so that it can be easily distinguished from genuine audio. As those seeking to misuse AI-generated audio may thus seek to remove audio watermarks, studying effective watermark removal techniques is critical to being able to objectively evaluate the robustness of audio watermarks against removal. Previous watermark removal schemes either assume impractical knowledge of the watermarks they are designed to remove or are computationally expensive, potentially generating a false sense of confidence in current watermark schemes. We introduce HarmonicAttack, an efficient audio watermark removal method that only requires the basic ability to generate the watermarks from the targeted scheme and nothing else. With this, we are able to train a general watermark removal model that is able to remove the watermarks generated by the targeted scheme from any watermarked audio sample. HarmonicAttack employs a dual-path convolutional autoencoder that operates in both temporal and frequency domains, along with GAN-style training, to separate the watermark from the original audio. When evaluated against state-of-the-art watermark schemes AudioSeal, WavMark, and Silentcipher, HarmonicAttack demonstrates greater watermark removal ability than previous watermark removal methods with near real-time performance. Moreover, while HarmonicAttack requires training, we find that it is able to transfer to out-of-distribution samples with minimal degradation in performance.
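The key assumption above is that black-box access to the watermarker alone suffices to train a removal model, because the attacker can manufacture unlimited (watermarked, clean) training pairs. A toy sketch of that threat model, using a hypothetical stand-in watermark and a crude average-residual "remover" in place of HarmonicAttack's dual-path autoencoder:

```python
import random

def toy_watermark(audio):
    """Stand-in for the targeted scheme: adds a fixed alternating
    pattern. Hypothetical; real watermarks are far subtler."""
    return [s + 0.05 * ((-1) ** i) for i, s in enumerate(audio)]

def build_training_pairs(n, length=16):
    """Black-box access to the watermarker is enough to create
    (watermarked, clean) supervision pairs."""
    pairs = []
    for _ in range(n):
        clean = [random.uniform(-1, 1) for _ in range(length)]
        pairs.append((toy_watermark(clean), clean))
    return pairs

def fit_residual_remover(pairs):
    """Learn the average per-sample residual added by the scheme;
    a minimal proxy for training a removal autoencoder."""
    length = len(pairs[0][0])
    resid = [0.0] * length
    for wm, clean in pairs:
        for i in range(length):
            resid[i] += (wm[i] - clean[i]) / len(pairs)
    return lambda audio: [s - r for s, r in zip(audio, resid)]

remover = fit_residual_remover(build_training_pairs(50))
```

Once fitted, the remover generalizes to any watermarked sample from the same scheme, which mirrors the paper's claim that one trained model strips the targeted watermark from arbitrary audio.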


New AI technique sounding out audio deepfakes

AIHub

Researchers from Australia's national science agency CSIRO, Federation University Australia and RMIT University have developed a method to improve the detection of audio deepfakes. The new technique, Rehearsal with Auxiliary-Informed Sampling (RAIS), is designed for audio deepfake detection -- a growing cybercrime threat used for bypassing voice-based biometric authentication systems, impersonation and disinformation. It determines whether an audio clip is real or artificially generated (a 'deepfake') and maintains performance over time as attack types evolve. In Italy earlier this year, an AI-cloned voice of the country's Defence Minister requested a €1M 'ransom' from prominent business leaders, convincing some to pay. This is just one of many examples highlighting the need for audio deepfake detectors.


SING: Symbol-to-Instrument Neural Generator

Alexandre Defossez, Neil Zeghidour, Nicolas Usunier, Leon Bottou, Francis Bach

Neural Information Processing Systems

These embeddings are decoded by a single four-layer convolutional network to generate notes from nearly 1000 instruments, 65 pitches per instrument on average and 5 velocities.



Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report

de Brito, Daniel Oliveira, de Souza, Letícia Gabriella, Gauy, Marcelo Matheus, Finger, Marcelo, Junior, Arnaldo Candido

arXiv.org Artificial Intelligence

This technical report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We fine-tuned Audio-MAE and three PANN architectures (CNN6, CNN10, CNN14) on the Coswara and COUGHVID datasets, evaluating both intra-dataset and cross-dataset generalization. We implemented a strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status. Intra-dataset results showed moderate performance, with Audio-MAE achieving the strongest result on Coswara (0.82 AUC, 0.76 F1-score), while all models demonstrated limited performance on COUGHVID (AUC 0.58-0.63). Cross-dataset evaluation revealed severe generalization failure across all models (AUC 0.43-0.68), with Audio-MAE showing strong performance degradation (F1-score 0.00-0.08). Our experiments demonstrate that demographic balancing, while reducing apparent model performance, provides a more realistic assessment of COVID-19 detection capabilities by eliminating demographic leakage - a confounding factor that inflates performance metrics. Additionally, the limited dataset sizes after balancing (1,219-2,160 samples) proved insufficient for deep learning models that typically require substantially larger training sets. These findings highlight fundamental challenges in developing generalizable audio-based COVID-19 detection systems and underscore the importance of rigorous demographic controls for clinically robust model evaluation.
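The report's demographic stratification can be sketched in miniature: within each (age band, gender) stratum, downsample so positive and negative counts match, which removes the demographic shortcut a model could otherwise exploit. The field names below (`age_band`, `gender`, `label`) are hypothetical, not the report's actual schema:

```python
import random
from collections import defaultdict

def demographic_balance(samples, seed=0):
    """Downsample so that within each (age_band, gender) stratum the
    positive and negative label counts are equal. Strata containing
    only one class are dropped entirely, since they carry pure
    demographic signal."""
    rng = random.Random(seed)
    strata = defaultdict(lambda: defaultdict(list))
    for s in samples:
        strata[(s["age_band"], s["gender"])][s["label"]].append(s)
    balanced = []
    for groups in strata.values():
        if len(groups) < 2:
            continue  # single-class stratum: nothing to balance
        n = min(len(members) for members in groups.values())
        for members in groups.values():
            balanced.extend(rng.sample(members, n))
    return balanced
```

This is also why the balanced datasets shrank to 1,219-2,160 samples: every stratum is capped at twice its minority-class count.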


AudioMarkBench: Benchmarking Robustness of Audio Watermarking

Neural Information Processing Systems

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audio.

